Genetic profiling of different phenotypic subsets of breast cancer stem cells (BCSCs) in breast cancer patients

Background Breast cancer stem cells (BCSCs) have a crucial role in breast carcinogenesis, development, and progression. The aim of the current study is to characterize the BCSCs through the genetic profiling of different BCSC phenotypic subsets to determine their related genetic pathways. Methods Fresh tumor tissue samples were obtained from 31 breast cancer (BC) patients for (1) mammosphere culture; (2) magnetic separation of the BCSC subsets using CD24, CD44, and CD326 MicroBeads; (3) flow cytometry (FCM) assay using CD44, CD24, and EpCAM; and (4) RT-PCR profiler arrays using a stem cell (SC) panel of 84 genes for four groups of cells: (1) CD44+/CD24−/EpCAM− BCSCs, (2) CD44+/CD24−/EpCAM+ BCSCs, (3) mammospheres, and (4) normal breast tissues. Results The CD44+/CD24−/EpCAM− BCSCs showed significant downregulation in 13 genes and upregulation in 15, where CD44, GJB1, and GDF3 showed the maximal expression (P = 0.001, P = 0.003, and P = 0.007, respectively). The CD44+/CD24−/EpCAM+ BCSCs showed significant upregulation in 28 genes, where CD44, GDF3, and GJB1 showed maximal expression (P < 0.001, P = 0.001, and P = 0.003, respectively). The mammospheres showed significant downregulation in 9 genes and significant upregulation in 35 genes. The maximal overexpression was observed in GJB1 and FGF2 (P = 0.001 and P = 0.001, respectively). The genes that achieved significant overexpression in all SC subsets were CD44, COL9A1, FGF1, FGF2, GDF3, GJA1, GJB1, GJB2, HSPA9, and KRT15, while BMP2, BMP3, EP300, and KAT8 were significantly downregulated. The genes that were differentially expressed by the mammospheres compared to the other BCSC subsets were CCND2, FGF3, CD4, WNT1, KAT2A, NUMB, ACAN, COL2A1, TUBB3, ASCL2, FOXA2, ISL1, DTX1, and DVL1. Conclusion BCSCs have specific molecular profiles that differ according to their phenotypes, which could affect patients' prognosis and outcome.

Breast cancer (BC) is a disease with a multifactorial etiology. It is divided into distinct pathological subtypes including ductal, lobular, and mucinous carcinomas. It also has variable molecular characteristics according to estrogen receptor (ER) and progesterone receptor (PR) expression and HER2 amplification. In addition, it has been classified by transcriptome-based classifications into luminal and basal breast cancers [2][3][4].
Despite the variety of treatment modalities and diagnostic tools available for BC patients, there is still an increased incidence of metastasis and adverse outcomes. Therefore, it is important to understand the underlying molecular mechanisms involved in the carcinogenesis and progression of BC [5].
Breast cancer stem cells (BCSCs) represent a subpopulation of tumor cells that possess the ability to self-renew, divide indefinitely, and differentiate into other types of cells according to the surrounding growth factors [6]. Accumulating evidence suggests that BCSCs are the leading cause of cancer progression and metastasis, as well as resistance to antitumor chemotherapy, radiotherapy, or hormonal therapy [7,8]. The BCSCs are characterized by the surface marker expression pattern CD44+/CD24−/low, as well as by mammosphere formation [5]. Mammospheres can be developed by culturing in non-adherent, non-differentiating conditions, which promote cells that are capable of survival and continuous proliferation in culture as discrete spherical clusters [9]. These mammosphere culture systems are used to identify and enrich putative BCSCs.
CD44 is a non-kinase cell surface glycoprotein that binds to hyaluronic acid (HA) and mediates the interaction between the BCSCs and the surrounding matrix metalloproteases (MMPs) and osteopontin (OPN) [10,11]. Therefore, CD44 is important for the stemness properties of the cancer cells, as well as for the regulation of cell proliferation, differentiation, and survival [12,13].
CD24 is a glycosylphosphatidylinositol-linked cell surface glycoprotein that inhibits chemokine receptor-4 (CXCR4) and regulates cell metastasis and proliferation [14,15]. It is usually downregulated on the surface of BCSCs; however, its expression on BCSCs is associated with adverse outcomes in the luminal A and triple-negative BC (TNBC) subtypes [16]. CD24 and CD326 (EpCAM) are the main surface markers expressed on mammary stem cells (MaSCs). The MaSCs are normally present in the adult mammary gland, where they are responsible for maintaining the ductal architecture [17]. These cells were also identified as proliferative heterogeneous stem cells/progenitors in the luminal types of breast cancer [18], whereas the CD24−/low CD44+ BCSCs were more commonly enriched in the basal-like subtype and less frequent in the luminal types [19]. BC is characterized by a high degree of intratumor heterogeneity, as a single tumor may contain BCSCs with different phenotypes according to the molecular forms of the tumor [20,21]. Therefore, the characterization of the BCSCs should not rely on CD44+ CD24−/low only [5]. Other features can be used for the characterization of BCSCs, including the expression of markers such as ALDH1, Prominin-1 (CD133), and CD131, their ability to form spheroid cultures, and the expression of different molecular markers involved in maintaining the self-renewal, differentiation, and stemness properties of the BCSCs [22,23]. The related signaling pathways include Notch, Wnt/β-catenin, Hedgehog (Hh), TNF-α/NF-κB, transforming growth factor-β (TGF-β), receptor tyrosine kinase (RTK), and Janus kinase/signal transducer and activator of transcription (JAK-STAT) pathways [24,25].
Therefore, the aim of the current study is to characterize the BCSCs by genetic profiling of different BCSC phenotypic subsets and to determine their related genetic pathways. This will allow us to define more accurately the possible impact of BCSCs on the development and progression of BC, as well as their contribution to patients' response to treatment, outcomes, and survival rates. Hence, it could open a new avenue for potential targeted therapy in BC patients.

Methods
This is a retrospective cohort study that included 31 patients with histopathologically confirmed BC. The study was conducted at the National Cancer Institute (NCI), Cairo University, during the period from January 2019 to May 2021. All patients were subjected to full history taking, full clinical examination, and complete laboratory and radiological assessment. The normal control samples were obtained from females who underwent reduction mammoplasty at the NCI surgical unit.

Sample collection
Fresh tumor samples were obtained from the operating theatre in a sterile 50 ml plastic Falcon tube containing 10 ml of Dulbecco's modified Eagle's medium (DMEM). The samples were transferred immediately to the tissue culture lab for processing. A section of the tumor was sent to the Pathology department for routine histopathology and immunohistochemistry [estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and Ki-67] to confirm the diagnosis.

Isolation of breast cancer cells
The neoplastic tissues were washed several times in Hanks' Balanced Salt Solution (HBSS; Invitrogen) and minced with sterile blades into very small pieces (0.2-0.5 mm each). The single-cell suspension was obtained by enzymatic digestion using collagenase (50-100 units/ml in HBSS; Invitrogen) according to the protocol followed. The cells were incubated for 4-18 h at 37 ℃ and then filtered using a sterile stainless steel or nylon mesh. The cell suspension was washed several times by centrifugation in HBSS, and the pellets were then re-suspended in 500 µl to 1 ml of DMEM. The cells were counted using a haemocytometer and divided into two parts: one part was used for mammosphere culture, and the other part was used for cellular characterization by FCM using CD44, CD24, and cytokeratin or EpCAM monoclonal antibodies.

Mammosphere culture
The mammosphere culture was performed according to the method of Dontu et al. [26] with modifications [27,28]. Briefly, the isolated single breast cancer cells were suspended in ultra-low attachment plates at a density of 4 × 10^5 viable cells/mL in primary culture and 1000 cells/mL in each passage. The cells were cultured in DMEM/Ham F-12 media (1:1) supplemented with insulin (5 mg/mL), hydrocortisone (0.5 mg/mL), and epidermal growth factor (20 ng/mL; all from Invitrogen Ltd., Paisley, Scotland). The cells were then seeded into six-well plates (2.5 ml per plate) or T25 tissue culture flasks (5 ml per flask). The non-adherent cells were fed weekly, and the spheres were measured using a gridded lens. The mammospheres were enzymatically dissociated every 7 days to 2 weeks by incubation in 0.5% trypsin-EDTA solution (Invitrogen) for about 5-10 min at 37 ℃ and then dispersed by pipetting with a 23-gauge needle. During the mammosphere dissociation, a subset of cells from each passage was subjected to morphologic evaluation by microscopic examination, in addition to immunohistochemistry assessment using primary antibodies for pan-CK, CD44, CD24, and CD133 (Abcam, UK).
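The seeding step described above reduces to a simple dilution calculation (C1 × V1 = C2 × V2). The following minimal Python sketch shows how the volume of counted cell suspension needed to reach the stated primary-culture density of 4 × 10^5 viable cells/mL could be worked out. The stock concentration in the example is a hypothetical haemocytometer result, not a measurement from this study, and the function name is illustrative only.

```python
def seeding_volumes(stock_cells_per_ml, target_cells_per_ml, final_volume_ml):
    """Return (stock_ml, medium_ml) needed to reach the target density.

    Based on C1 * V1 = C2 * V2. All inputs are supplied by the user;
    nothing here is taken from the study's raw data.
    """
    if stock_cells_per_ml < target_cells_per_ml:
        raise ValueError("stock suspension is more dilute than the target density")
    stock_ml = target_cells_per_ml * final_volume_ml / stock_cells_per_ml
    medium_ml = final_volume_ml - stock_ml
    return stock_ml, medium_ml


# Hypothetical count: 2 x 10^6 viable cells/mL in the stock suspension.
# Target: 4 x 10^5 viable cells/mL in a final volume of 2.5 ml.
stock_ml, medium_ml = seeding_volumes(2e6, 4e5, 2.5)
print(f"{stock_ml:.2f} ml suspension + {medium_ml:.2f} ml medium")
```

The same function can be reused for passages by setting the target to 1000 cells/mL.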

Characterization of breast cancer cells by flow cytometry
A portion of the cell suspension was used for flow cytometric characterization of breast cancer stem cells using the Cell Quest program and the following conjugated antibodies: CD45-FITC (lymphocyte marker), CD24-PE (cancer stem cell and epithelial marker), CD44-FITC (cancer stem cell marker), and pan-cytokeratin-PE, cytokeratin 19-PE, or EpCAM-PE (epithelial markers), according to the manufacturers' instructions (Becton & Dickinson, R&D, Miltenyi). Appropriate isotype controls were included in all cases to determine the areas of non-specific staining, and unstained cells from each sample were also analyzed as a negative control. Accordingly, five subsets of cells were identified in each stained case: CD44+/CK− or EpCAM− cells, CD44+/CK+ or EpCAM+ cells, CD44−/CK+ or EpCAM+ cells, CD44+/CD24−/low cells, and CD24+ cells.

Separation of the breast cancer stem cells (BCSCs)
After the preparation of the single-cell suspension, magnetic separation of the BCSC subsets was done using the LS separation column (Miltenyi Biotec B.V. & Co. KG) according to the manufacturer's instructions. The cells were separated by magnetic selection after staining with CD24, CD44, and CD326 (EpCAM) MicroBeads labeled with monoclonal antibodies. Finally, the different subsets of cells were collected and stored for subsequent RNA extraction. Accordingly, the cells were divided into three groups: G1, CD44+/CD24−/EpCAM− cells; G2, CD44+/CD24−/EpCAM+ cells; and G3, mammospheres.

Gene profiling array
RNA extraction and quantitative real-time PCR (qRT-PCR)
RNA was extracted and purified from the different groups of cells after magnetic separation using the RNeasy Midi Kit (Cat. No. 74104, Qiagen) according to the manufacturer's instructions. The qRT-PCR was done using the RT2 profiler array (Cat. No. 330401, Qiagen). For the stem cell (SC) profiling assay, the SABiosciences RT2 qPCR Master Mix (Cat. No. 330522, Qiagen) was used to obtain the most accurate results from the PCR array.
The PCR was performed in the MaxPro3000 real-time PCR system (Stratagene). For the 96-well plate array, the following reagents were mixed in a 5-ml tube at the recommended concentrations using a multi-channel pipette: 2X SABiosciences RT2 qPCR Master Mix, diluted First Strand cDNA Synthesis Reaction (using 500 ng RNA), and H2O. The amplification program consisted of an initial step at 95 ℃ for 10 min, followed by 40 cycles at 95 ℃ for 15 s and 55 ℃ for 90 s. The cycle threshold (Ct) for each well was determined, and the ΔΔCt method was used for data analysis by the instrument's software. The Ct values of the control wells were determined, including the Ct value of the genomic DNA control (GDC); if it was greater than 35, the level of genomic DNA contamination was considered too low to affect the gene expression profiling results. The studied genes are listed in Table 1.
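For readers unfamiliar with the ΔΔCt calculation mentioned above, the following is a minimal Python sketch of the standard formula (fold change = 2^(−ΔΔCt)) together with the GDC threshold rule stated in the text. In the study the calculation was performed by the instrument's software; the reference (housekeeping) gene, the function names, and the Ct values below are assumptions made purely for illustration.

```python
def fold_change(ct_gene_test, ct_ref_test, ct_gene_ctrl, ct_ref_ctrl):
    """Relative expression by the delta-delta-Ct method: 2 ** -(ddCt).

    ct_ref_* are Ct values of a reference gene; which reference genes
    were used in the study is not stated here, so this is an assumption.
    """
    delta_ct_test = ct_gene_test - ct_ref_test    # normalise test sample
    delta_ct_ctrl = ct_gene_ctrl - ct_ref_ctrl    # normalise control sample
    delta_delta_ct = delta_ct_test - delta_ct_ctrl
    return 2.0 ** (-delta_delta_ct)


def gdna_contamination_negligible(ct_gdc, cutoff=35.0):
    """Rule from the text: a genomic DNA control Ct above 35 is acceptable."""
    return ct_gdc > cutoff


# Hypothetical Ct values, for illustration only.
fc = fold_change(ct_gene_test=24.0, ct_ref_test=18.0,
                 ct_gene_ctrl=27.0, ct_ref_ctrl=18.0)
print(fc)  # ddCt = (24 - 18) - (27 - 18) = -3, so 2 ** 3 = 8.0
```

A fold change above 1 corresponds to upregulation relative to the control and a value below 1 to downregulation.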

Data analysis
Data management and analysis were performed using the statistical software package SPSS, version 22 (IBM, Armonk, NY, USA). The flow cytometry data were presented as medians and interquartile ranges (IQR) according to the result of the normality test. Comparisons between groups were performed using the Mann-Whitney test. The PCR Array Data Analysis Web Portal presents the results in a tabular format, a scatter plot, a three-dimensional profile, and a volcano plot. All tests of hypotheses were performed at an alpha level of 0.05, with a 95% confidence interval.
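The summary statistics and the group comparison described above can be sketched in plain Python as follows. This is not the SPSS procedure used in the study: it is a minimal illustration that computes the median with quartiles and a two-sided Mann-Whitney U test using the normal approximation with a tie correction and no continuity correction. Quartile conventions and small-sample exact p-values differ between packages, and the sample values are made-up numbers, not study data.

```python
import math
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3); quartile conventions vary between packages."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q1, q3


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t                # contribution of this tie group
        i = j
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided p-value
    return u, p


# Made-up values for two groups, for illustration only.
group_a = [4.0, 5.0, 6.0]
group_b = [1.0, 2.0, 3.0]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(u_stat, round(p_value, 3))
print(median_iqr([1, 2, 3, 4, 5, 6, 7]))
```

With samples this small, a statistics package would normally report an exact p-value instead of the approximation used here.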

Patients' characteristics
The median age of the recruited female BC patients was 47 years (range: 22-68), and the mean age was 48.1 ± 11.4 years. There were 15 (48.4%) patients with grade 2 tumors and 16 (51.6%) patients with grade 3 tumors. Lymph node (LN) involvement was encountered in 22 (71%) patients, and capsular invasion was detected in 18 (58.1%) patients. ER and PR were expressed in the tumor tissue of 14 (45.2%) patients, while HER2 was expressed in the tumor tissue of 7 (22.6%) patients. In addition, 8 (25.8%) patients were positive for distant metastasis (Table 2).

Mammosphere culture of the primary breast cancer cells
The mammospheres are formed of cells that are capable of surviving and proliferating as discrete clusters in non-adherent, non-differentiating culture conditions. Such spheroids are enriched in progenitor cells capable of differentiating along multiple lineages. The size of the mammospheres depends on the proliferation of the cells, which varied according to the severity and aggressiveness of the disease.
Viable mammospheres were produced in 13 out of the 31 cases included in the study (cultures were feasible for 15 cases only). In these cases, the mammospheres ranged in size from 20 to 180 µm and were successfully cultured past the third passage (Fig. 1). Immunohistochemistry using lineage markers was performed on cell blocks obtained from mammospheres after the second passage as well as after full differentiation.
There was a significant association between CD44+/CD24low/− expression and the ability of the tumor to metastasize, as the expression of CD44+/CD24low/− in patients with distant metastasis was 61% (range: 7-80%), compared to 25.3% (range: 0.1-68%) in those without metastasis (P = 0.038). Notably, patients with increased CD44+ expression showed an increased incidence of metastasis and of ER and PR expression, though this did not reach statistical significance (P = 0.071, 0.059, and 0.059, respectively; Table 3).
Regarding the differential gene expression in the mammospheres, there was a significant downregulation in 9 genes, which are involved in cell cycle regulation (CDC42,
However, the genes that achieved significant overexpression in all studied SC subsets were CD44, COL9A1, FGF1, FGF2, GDF3, GJA1, GJB1, GJB2, HSPA9, and KRT15, while those that were significantly downregulated were BMP2, BMP3, EP300, and KAT8.

Discussion
Breast cancer is a heterogeneous disease characterized by variable genetic and phenotypic subtypes, which accordingly lead to diverse outcomes in BC patients. This heterogeneity of the BC cells is mainly linked to the cell of origin [29]. An increasing body of evidence indicates that BC could develop from dysregulation of the mammary stem cells. In the current study
Table legend: cells in orange denote upregulated genes, cells in green denote downregulated genes, and cells in blue denote differentially significant expression; a p-value is significant if <0.05. *Indicates the genes significantly expressed in the three assessed groups of cells.
Our data also showed that CD44+/CD24−/EpCAM+ BCSCs are more aggressive and tumorigenic than the CD44+/CD24−/EpCAM− BCSCs, as denoted by the differential expression of genes involved in the Wnt and Notch pathways, as well as the increased expression of mesenchymal, embryonic, and neural cell lineage markers. These findings are consistent with Luo et al. [6], who reported that CD44+/CD24− BCSCs positive for EpCAM showed an increased incidence of treatment resistance and tumor recurrence. Similarly, Al-Hajj et al. [34] concluded that the CD44+/CD24− BCSCs that express EpCAM on their cell surface were more tumorigenic and had potent invasive properties when transplanted into immunodeficient mice, in comparison to those lacking EpCAM expression.
Thus, the increased expression of EpCAM on the surface of BCSCs confers aggressiveness, metastatic potential, drug resistance, and tumorigenic properties on the BC, which is consequently associated with poorer outcomes in BC patients.
Furthermore, in both BCSC subsets (CD44+/CD24−/EpCAM+ and CD44+/CD24−/EpCAM−), the maximal expression was observed in CD44, followed by GDF3 (growth differentiation factor-3) and GJB1 (gap junction beta-1 protein). These data are in line with several reports that CD44 has a fundamental role in EMT, as it acts as an adhesion molecule and a receptor for the extracellular glycosaminoglycan hyaluronic acid, which leads to increased cell motility and tumorigenicity [6,35,36]. This finding is supported by our results showing that CD44+/CD24low/− expression was significantly increased in BC patients with distant metastasis.
Regarding the gene expression profile of the cultured mammospheres, the current data demonstrated that the genes differentially overexpressed by the mammospheres compared to the other BCSC subsets (CD44+/CD24−/EpCAM+ and CD44+/CD24−/EpCAM−) were those involved in cell cycle regulation (CCND2, FGF3), cell adhesion molecules (CD4), the Wnt pathway (WNT1), the Notch pathway (KAT2A, NUMB), mesenchymal cell lineage markers (ACAN, COL2A1), neural cell lineage markers (TUBB3), and embryonic cell lineage markers (ASCL2, FOXA2, ISL1). In contrast, there was a differential downregulation of DTX1 and DVL1, which are involved in the Notch pathway. These exclusively upregulated genes in mammospheres confirm the nature of the mammosphere-derived cells, as mammosphere culture allows for the selection of highly undifferentiated and aggressive cells with marked stemness properties. In line with these data, Dontu and colleagues performed microarray analysis of mammospheres and other differentiated BC cells. They observed a significant upregulation of genes involved in growth hormone receptors, thrombin receptors, and the Notch signaling pathway [26]. Another study by Ramalho-Santos et al. also reported the upregulation of Jak/Stat signaling, Notch signaling, transporter activity, DNA repair genes, and growth hormone and thrombin receptors in mammosphere-derived cells [37].
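The mammosphere-specific genes listed in the paragraph above can be organised as a small lookup table, which makes it easy to check the gene count or to ask which functional category a given gene was assigned to. The Python sketch below is only one possible way to encode the groupings reported in the text; the dictionary layout and function names are illustrative assumptions, and no expression values are included.

```python
# Genes differentially expressed by mammospheres, grouped as reported above.
MAMMOSPHERE_SPECIFIC = {
    "up": {
        "cell cycle regulation": ["CCND2", "FGF3"],
        "cell adhesion molecules": ["CD4"],
        "Wnt pathway": ["WNT1"],
        "Notch pathway": ["KAT2A", "NUMB"],
        "mesenchymal lineage markers": ["ACAN", "COL2A1"],
        "neural lineage markers": ["TUBB3"],
        "embryonic lineage markers": ["ASCL2", "FOXA2", "ISL1"],
    },
    "down": {
        "Notch pathway": ["DTX1", "DVL1"],
    },
}


def genes(direction):
    """Flatten all genes reported for one direction ('up' or 'down')."""
    groups = MAMMOSPHERE_SPECIFIC[direction]
    return [gene for members in groups.values() for gene in members]


def categories_of(gene):
    """Return (direction, category) pairs in which a gene symbol appears."""
    return [
        (direction, category)
        for direction, groups in MAMMOSPHERE_SPECIFIC.items()
        for category, members in groups.items()
        if gene in members
    ]


print(len(genes("up")), len(genes("down")))   # 12 up, 2 down: 14 genes in total
print(categories_of("NUMB"))
```

Encoding the direction separately from the category keeps the Notch pathway genes that go up (KAT2A, NUMB) distinct from those that go down (DTX1, DVL1).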
Moreover, our data revealed that the most highly overexpressed genes in the mammospheres compared to the control group were GJB1, followed by FGF2, JAG1, and COL9A1. These results are consistent with previously published studies reporting that GJB1 and JAG1 are significantly associated with the development of distant metastasis [38], while COL9A1 and FGF2 mutations are associated with drug resistance in breast cancer patients [39,40].

Conclusions
The current study provided evidence that BCSCs have specific molecular profiles that differ according to their phenotypes, which could affect patients' prognosis and outcome. CD44 is an important marker for the characterization of BCSCs, and its co-expression with EpCAM confers more aggressive and metastatic properties on the BCSCs than the CD44+/CD24−/EpCAM− phenotype. The mammospheres are the most aggressive type of breast cancer cells, with a distinct molecular profile that includes CCND2, FGF3, CD4, WNT1, KAT2A, NUMB, ACAN, COL2A1, TUBB3, ASCL2, FOXA2, ISL1, DTX1, and DVL1. These genes could be considered molecular markers for aggressiveness, metastasis, and resistance to treatment. Additionally, our data provided a panel of 14 genes (CD44, COL9A1, FGF1, FGF2, GDF3, GJA1, GJB1, GJB2, HSPA9, KRT15, BMP2, BMP3, EP300, and KAT8) that were differentially expressed (up- or downregulated) in all the assessed subsets of BCSCs and could serve as molecular markers with potential diagnostic, prognostic, and/or predictive value for breast cancer patients. However, the data of the profiling array should be validated in a larger number of patients and correlated with the patients' clinical courses in terms of distant metastasis, response to treatment, and survival rates.